1989-06-24
Digitized Voice Programmer's Toolkit for the PC
-----------------------------------------------
Version 1.0
Copyright (c) 1988,1989, Farpoint Software
* * * * * * * * * * * * * * * * *
**************************************************************************
* *
* To those of you who have HIDI.ARC and/or DIGITS.ARC, welcome back. *
* This new release will serve as a major upgrade to things you already *
* have. *
* *
**************************************************************************
Introduction
------------
This toolkit is a combination of software and hardware designed for the
purpose of mechanizing and simplifying the process by which programmers may
create digitized voice recordings, store them on disk, edit the voice data
files, and incorporate digitized voice playback into their own high-level
language programs.
The recording of digitized voice requires a small, inexpensive hardware device
to be built. Schematics and printed circuit board layout files are provided
for this device.
Playback of the digitized voice, however, requires NO SPECIAL HARDWARE. The
sound is produced with the built-in speaker provided in nearly all PC's and
PC-compatible machines. This means that programs may be written for general
distribution which will play voice messages on the user's machine as it
exists.
Here is a list of the major features of the current software package:
(1) Operates under the DOS environment.
(2) Provides a full set of voice record/playback control routines which
are directly callable from many high-level languages including C
and Pascal. They are also of course callable from assembly language.
(3) All voice operations proceed IN THE BACKGROUND. The control routines
return to the caller immediately, and voice playback occurs
simultaneously with the continuing execution of the main program.
The main program may call a status routine at any time to check on
the progress of the voice playback.
(4) There are no length limitations on either the size of the memory
buffers or the size of the voice data files on disk other than the
physical limits of the machine itself. 64k is not a special number.
(5) A sophisticated voice data file editor is provided. This gives the
programmer a set of capabilities similar to those available on a
conventional tape recorder. Position markers, live overwriting,
selective erasure, cut-and-paste, and assorted other features make
the production of "refined" voice files an easy task.
(6) Several short example programs are included, written in both C and
assembly language, which demonstrate the use of the calls to the
voice modules. There is even an example of a memory-resident program
which detects the pressing of the left shift key and plays a short
voice message when this occurs. (Foreground processing continues
undisturbed.)
Shareware Notice
----------------
The Digitized Voice Programmer's Toolkit is released as Shareware. This is
copyrighted material; it is NOT "free software". You are permitted to
experiment with this package long enough to determine if it suits your needs,
but if you will be making use of the material in your own programs, then a
license fee of $50 is required. NO PROGRAM WHICH MAKES USE OF THE MATERIALS
IN THIS TOOLKIT MAY BE SOLD COMMERCIALLY OR ON A CONTRACT BASIS UNLESS THE
SELLER HAS PAID THE LICENSE FEE. Please make the check or money order payable
to:
Farpoint Software
2501 Afton Court
League City, Texas 77573
For convenience, a registration form is included in the file REGISTER.FRM.
As a registered user, you will receive updates automatically long before they
are released to BBS's. You will also receive a copy of the source code to the
VDFE editor. Registered users, of course, are given higher priority if
programming assistance or hardware construction assistance is requested.
You are granted permission to distribute copies of the Digitized Voice
Programmer's Toolkit, provided that (1) no fee is charged for such copies,
other than a nominal disk duplication fee, (2) these files are distributed
in their original, unmodified form, and (3) ALL the files in the original
archive are included with each copy. (See "List of Files" below.)
If you paid a "disk duplication fee" or other such fee to a distributor of
public domain and shareware programs, be aware that the payment of this fee
DOES NOT constitute registration of this Toolkit. Likewise, the payment of a
fee to any Bulletin Board Service for the time required to download this
Toolkit DOES NOT constitute registration. Registration occurs only through
direct interaction with Farpoint Software.
If more information is needed, write or contact Alan D. Jones through
CompuServe Information Service at user ID 74030,554.
List of Files
-------------
The files included with the Digitized Voice Programmer's Toolkit are:
BIN2ASM
BIN2ASM.C
BIN2ASM.EXE
EMBEDDED
EMBEDDED.C
EMBEDDED.EXE
EVM.PRE
EVM.SUF
EVM.VOI
LONGTEST.VOI
README.1ST
REGISTER.FRM
RUN_ME.BAT
TSR
TSR.ASM
TSR.EXE
TSRVM.PRE
TSRVM.SUF
TSRVM.VOI
VDFE.EXE
VMSCH.HPP
VOICEKIT.DOC
VPMOD.ASM
VPMOD.DOC
VPMOD.H
VPMOD.OBJ
VPTEST
VPTEST.C
VPTEST.EXE
VRMOD.ASM
VRMOD.DOC
VRMOD.H
VRMOD.OBJ
VRTEST
VRTEST.C
VRTEST.EXE
If you received the Toolkit with any of the above files missing, please
notify Farpoint Software.
Description of Voice Subroutine Modules
---------------------------------------
The key software elements in the kit are two assembly language programs,
VRMOD.ASM and VPMOD.ASM, and their assembled OBJ files. These are not stand-
alone programs. They are designed to be linked with other programs to provide
the voice control routines. The calls associated with recording are in
VRMOD, and the calls associated with playback are in VPMOD. Any given program
may be linked with either or both of these modules. Typically, a program
designed for general distribution would be linked only with VPMOD, since
recording requires the hardware device.
The external hooks to the two modules consist of various "public" procedure
names. All procedures use the Pascal calling convention, since most high-level
language compilers can support this calling method. The Pascal calling
convention has the following meaning:
(1) Procedure names are all caps, and are not preceded by an underscore.
(2) Procedures are called with "far" (intersegment) calls.
(3) Short return values appear in the AX register; long return values
appear in DX:AX.
(4) Parameters are pushed onto the stack in left-to-right order; i.e. the
first parameter in the list is pushed first. If the parameter is a
doubleword, then the high order word is pushed first.
(5) The called subroutine is responsible for clearing the parameters from
the stack upon return.
The above list will be of interest primarily to assembly language programmers.
When working in a high-level language, it is necessary only to make sure that
the compiler is using the proper calling method. For C programs, two header
files have been included. They are VRMOD.H and VPMOD.H. At the beginning of
any C program which is to use the voice playback routines, insert the line:
#include "vpmod.h"
This file contains prototypes of all procedure calls in VPMOD.ASM, declared
in a way that causes the compiler to generate correct calling code.
The details of how each individual procedure call operates will be found in
the separate documents VRMOD.DOC and VPMOD.DOC. It is suggested that you
print these files for use as reference material while writing programs.
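For assembly language programmers, items (4) and (5) of the calling
convention list can be illustrated with a small C model of the stack. This
is only a sketch: the function name example_push_args, the two-parameter
shape (one 16-bit word followed by one 32-bit doubleword), and the array
standing in for the stack are all hypothetical and are not taken from VRMOD
or VPMOD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the 16-bit words a caller pushes for a hypothetical call
   PROC(word_arg, dword_arg) under the Pascal convention described above.
   out[0] is the first word pushed and out[2] is the last. The return value
   is the number of words pushed, which is also what the called routine
   must remove from the stack when it returns. */
size_t example_push_args(uint16_t word_arg, uint32_t dword_arg, uint16_t out[3])
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = word_arg;                        /* first parameter pushed first */
    out[n++] = (uint16_t)(dword_arg >> 16);     /* doubleword: high word first  */
    out[n++] = (uint16_t)(dword_arg & 0xFFFFu); /* then the low word            */
    return n;
}
```

Under these rules a call passing 1234h and AABBCCDDh pushes 1234h, AABBh, and
CCDDh in that order, and the called routine removes six bytes on return.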
It is possible to link both VRMOD.OBJ and VPMOD.OBJ to the same program, but
you should NOT have both packages initialized at the same time. Each package
assumes "ownership" of timer channel zero, and this would cause a conflict
over the setting of the hardware timer interval, not to mention the problem
of possible insufficient CPU time to execute both interrupt routines at every
timer tick (at 16500 Hz). The solution here is (1) never attempt to record and
play back at the same time, and (2) don't call PVOICE_INIT until playback is
ready to begin and be sure to call PVOICE_CLEANUP immediately after playback
ends. (Similar rules apply to recording.)
Example Programs
----------------
Note: "Make" files acceptable to Microsoft's Make utility are included
for all the example programs. The compiler used was the Microsoft
C Compiler version 5.10. The assembler was the Microsoft Macro
Assembler version 5.10. The make files are written to assume that
the compiler is installed to include the Large model library and
that the default operating system is DOS. If the compiler defaults
to the OS/2 operating system, then change the make files so that
all occurrences of "llibce" become "llibcer".
VRTEST.C (VRTEST.EXE):
[Related files: VRTEST]
This program works like RECORD.COM provided with the first voice
digitization package released in 1988. It demonstrates the use of
all the procedure calls and features in VRMOD. To execute the program,
first attach the voice recording circuit to a COM port, then at the
DOS prompt type: VRTEST 1 TESTFILE.VOI. If you are using COM2, then
substitute "2" for the "1". The filename "TESTFILE.VOI" may be any filename.
Recording will begin and messages will scroll on the screen indicating the
number of bytes of data recorded. Writing to the file will be performed
"on the fly". The memory buffer size is currently set to 16k, but may be
changed by editing and recompiling the program. Recording will continue
until either the <Esc> key is pressed or the disk is full. The size of the
memory buffer should be at least 8k, but beyond this point it is actually
irrelevant as long as calls to RVOICE_CATCHUP are made frequently enough
(which means at least once every 3 seconds).
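The 8k minimum and the 3-second rule can be related with a little
arithmetic. The sketch below assumes that each timer tick at 16500 Hz stores
one bit, so that eight ticks fill one byte. That packing is an inference
from the figures quoted in this document, not a documented property of
VRMOD, and the function name is hypothetical.

```c
#include <assert.h>

#define SAMPLE_RATE_HZ 16500.0  /* timer tick rate given in this document */
#define BITS_PER_BYTE  8.0      /* assumption: one bit stored per tick    */

/* Seconds of voice that fit in a buffer of the given size. */
double buffer_seconds(unsigned long bytes)
{
    assert(bytes > 0);
    return (double)bytes * BITS_PER_BYTE / SAMPLE_RATE_HZ;
}
```

Under that assumption an 8k buffer holds a little under 4 seconds of sound,
which leaves some margin over a 3-second catch-up interval, and the 16k
buffer used in VRTEST holds a little under 8 seconds.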
VPTEST.C (VPTEST.EXE):
[Related files: VPTEST]
This is the counterpart to VRTEST. It demonstrates the use of all the
procedure calls in VPMOD. As in VRTEST, the memory buffer is currently 16k
but may be changed by editing and recompiling. The command line to execute
the program is VPTEST TESTFILE.VOI, where "TESTFILE.VOI" is the name of
a file containing voice data. The reading of the file will occur as needed
to keep the buffer full or until all bytes have been read. The size of
the memory buffer needs to be increased beyond 8K only if it is not possible
to call PVOICE_CATCHUP at least once every 3 seconds. (Note that it may
also be advisable to increase the buffer size if the file is being read
from a floppy disk, since accesses may be quite slow.)
EMBEDDED.C (EMBEDDED.EXE):
[Related files: EMBEDDED, EVM.VOI, EVM.PRE, EVM.SUF]
This is a simple example of the techniques used to embed voice data in an
executable program. Instead of reading a separate voice file, the voice
data is part of the EXE file. Note that the "make" file in this case is
as important to study as the C program. The trick here is to convert the
raw binary voice data file into an OBJ file that we can feed through the
linker. This is done in three stages: (1) The file-cruncher program BIN2ASM
is used to create a file containing only a long list of assembly language
DB statements equivalent to the binary data; (2) The prefix file EVM.PRE
and the suffix file EVM.SUF are combined with the DB statements to form
an assembly language module containing all necessary segment brackets and
public declarations; (3) This module is assembled and linked with the main
program. The content of the prefix and suffix files depends on the specific
application; in this example we use only a single segment and a single
block of voice data. A more complex program may contain several modules of
this type or have an assortment of labels within a single module. Since the
assembler requires segments to be 64k or less, BIN2ASM places a marker
comment (a semicolon and a string of minus signs) at each 64k boundary in
its output file. If this happens, you must edit the file to end a segment
and begin a new one at each of these boundaries.
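Stage (1) can be sketched in C as follows. This is not the BIN2ASM.C supplied
with the kit; the function name, the number of bytes per line, and the exact
text of the marker comment are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Write data as assembler DB statements, per_line bytes to a line. A marker
   comment is written before each byte whose offset is a nonzero multiple of
   seg_limit (65536 for real segments). Returns the number of markers. */
unsigned long bin_to_db(const unsigned char *data, unsigned long len,
                        unsigned per_line, unsigned long seg_limit, FILE *out)
{
    unsigned long i, markers = 0;
    unsigned col = 0;

    assert(data != NULL && out != NULL && per_line > 0 && seg_limit > 0);
    for (i = 0; i < len; i++) {
        if (i > 0 && i % seg_limit == 0) {
            if (col > 0) {              /* finish a partly filled DB line */
                fputc('\n', out);
                col = 0;
            }
            fputs(";----------------------------------------\n", out);
            markers++;
        }
        fputs(col == 0 ? "        DB      " : ",", out);
        /* The leading 0 keeps values such as 0FFh from reading as a symbol. */
        fprintf(out, "0%02Xh", (unsigned)data[i]);
        if (++col == per_line) {
            fputc('\n', out);
            col = 0;
        }
    }
    if (col > 0)
        fputc('\n', out);
    return markers;
}
```

A nonzero return value means the output still needs hand editing, since each
marker is a place where one segment must be ended and another begun.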
TSR.ASM (TSR.EXE):
[Related files: TSR, TSRVM.VOI, TSRVM.PRE, TSRVM.SUF]
This serves as both an example of a pure assembly language program using
VPMOD and a technique for including voice playback in a memory-resident
program. The voice data is embedded in the EXE file in the same way as it
was done in EMBEDDED.EXE above. Otherwise, the program is fairly
conventional. There is one major caution to observe, however: since a
memory-resident program may play voice concurrently with the execution of
another unknown program, don't set the file read flag (in PVOICE_START)
to 1 and don't use PVOICE_CATCHUP! Use of the "read-on-the-fly" feature
of the voice control routines calls DOS to read the disk. If a DOS call
is made within an interrupt service routine (especially a timer tick
routine), the interrupt may have occurred while a DOS call was already in
progress. In this case, DOS will be "re-entered", and it is NOT re-entrant.
Doing this will almost certainly cause a system crash.
If you are already familiar with the above problem, and have worked out a
system of calling DOS in the background during its "safe" moments, then
you probably will be able to use read-on-the-fly. Always call PVOICE_START,
PVOICE_INIT, PVOICE_CLEANUP, and PVOICE_CATCHUP during "safe" times. Also,
remember that timer interrupts will now be happening at about 16500 Hz, so
make sure that your program never disables interrupts for more than a very
short time. (One more thing: if you must hook INT 8, do it BEFORE calling
PVOICE_INIT.)
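One common shape for such a system is to let the interrupt-time code only
record that a refill is wanted, and to perform the disk read later from a
point where DOS is known to be idle. The sketch below models that idea in
portable C. The tsr_state structure and both functions are hypothetical
stand-ins and do not correspond to anything in VPMOD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool dos_busy;          /* stand-in for a "DOS call in progress" flag */
    bool refill_wanted;     /* set at interrupt time, cleared when served */
    unsigned refills_done;  /* counts deferred reads actually performed   */
} tsr_state;

/* Called at interrupt time: never touches the disk, only leaves a note. */
void on_buffer_low(tsr_state *s)
{
    assert(s != NULL);
    s->refill_wanted = true;
}

/* Called from a safe point (for example an idle loop). Returns true if the
   deferred read was performed on this call. */
bool service_refill(tsr_state *s)
{
    assert(s != NULL);
    if (!s->refill_wanted || s->dos_busy)
        return false;       /* nothing to do, or not safe yet */
    s->refill_wanted = false;
    s->refills_done++;      /* a real program would read the file here */
    return true;
}
```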
The Voice Data File Editor (VDFE)
---------------------------------
This program provides a convenient environment for creating, editing, and
generally patching together voice data files. Its function resembles that of
a tape recorder. It edits files only within its RAM buffer, which is limited
by the amount of memory on the machine available to DOS. On a 640k machine,
this translates to about 470k of buffer space, or 225 seconds (3 minutes and
45 seconds) of continuous sound. If you need to edit nonstop chunks of voice
data longer than that, they will have to be edited piecemeal and concatenated
afterward. (Of course, multi-megabyte voice data files may be recorded using
VRTEST or a similar program. If it turns out that people really need to edit
super-long files on a regular basis, I will include infinite-file-length
editing on a future release.)
VDFE requires no command line parameters. Upon execution, it displays its
primary screen and waits for user input. This consists primarily of single
keystroke commands, which are hereby documented in some detail:
<Up arrow> and <Down arrow>:
These are used to scroll the contents of the Operating Instructions window
in the lower right area of the screen. The window displays one-line
descriptions of all the keystroke commands.
<F1>:
Displays an information screen which briefly describes the purpose and
operation of VDFE.
<Esc>:
Exits to DOS. If the contents of the editing buffer have been altered since
the last save to disk, the user is asked to confirm the exit command.
<F2>:
Increments the COM port number shown at the left side of the screen. This
will be the port used for recording. Press <F2> repeatedly until the desired
port number shows.
<F3>:
Requests a file name, then loads the file into the edit buffer starting at
offset zero. The end-of-file position will be set to match the length of the
file. If the specified file does not exist, the user will be asked whether
to create the file. If the answer is "yes", then a zero-length file is
created and the end-of-file position is set to zero. The actual data in the
edit buffer remains unchanged.
<Alt F3>:
Requests the entry of a new file name. This becomes the current file name as
shown at the left side of the screen. Nothing is done with this name
immediately. The new file name will be used in subsequent "save current
data" (<F4>) operations.
<F4>:
Saves current data. The current filename is opened and truncated, and the
contents of the edit buffer from offset zero to the offset shown as the
end-of-file are written to the file.
<Space bar>:
The "Stop" button. If a record or playback operation is in progress, it is
stopped.
<Enter>:
The "play" button. The contents of the edit buffer are played back through
the speaker starting from the current position. Playback ends at the
end-of-file position. If the current position is greater than or equal to
the end-of-file position, playback will not occur.
<Insert>:
The "record" button. Digitized voice is input through the selected COM port
and written into the edit buffer. Writing begins at the current position,
overwriting existing data. Recording can be stopped by pressing <Space>,
<Enter>, or any key which normally has the function of changing the current
position. If the current position during recording exceeds the end-of-file
position, then the end-of-file position is moved forward continuously to
match the current position. If the current position reaches the end of the
edit buffer, then wrap-around will occur, causing recording to continue at
offset zero.
<Left arrow>:
Medium-speed rewind. The current position will be decremented by 256, which
corresponds to about 1/8 second of voice time.
<Right arrow>:
Medium-speed forward. The current position will be incremented by 256.
<Ctrl left arrow>:
Fine rewind. The current position will be decremented by 1 byte.
<Ctrl right arrow>:
Fine forward. The current position will be incremented by 1 byte.
<Page Up>:
High-speed rewind. The current position will be decremented by 8192, which
corresponds to about 4 seconds of voice time.
<Page Down>:
High-speed forward. The current position will be incremented by 8192.
<Home>:
The current position is set to zero.
<End>:
The current position is set to match the end-of-file position.
<Ctrl end>:
The end-of-file position is set to match the current position.
<0> through <9>:
Set marker. There are 10 markers, numbered 0 through 9. Each marker consists
of a slot in which a "current position" may be stored. Any time a digit key
is pressed, regardless of the stopped/playing/recording state, the current
position at that instant is copied into the corresponding marker. The marker
values are displayed in a window in the lower left area of the screen.
<Alt 0> through <Alt 9>:
Pressing a digit key (on the main section of the keyboard, NOT the numeric
keypad) while holding the <Alt> key down causes the current position to
change to match the value stored in the corresponding marker.
<F5> and <F6>:
Change the marker numbers which are assigned the "begin" and "end" flags.
In the left column of the marker window, two of the marker number positions
always contain 'beg' and 'end' rather than a digit. These are the ones used
in any operation that refers to a "marked section". Initially, marker 0 is
the "begin" marker and marker 1 is the "end". Press <F5> repeatedly to move
the 'beg' to the desired marker. Press <F6> repeatedly to move the 'end' to
the desired marker. The two flags are not allowed to be assigned to the same
marker.
<Tab>:
Sets the current position to match the "begin" marker and initiates a
playback operation which will terminate at the "end" marker.
<F7>:
A filename is requested from the user. The contents of the marked section of
the edit buffer are written to this file. If the file already exists, it
will be overwritten. The current filename remains unchanged.
<F8>:
A filename is requested from the user. The contents of this file are copied
into the edit buffer starting at the "begin" marker. The "end" marker is
changed to reflect the size of the file. The current filename remains
unchanged.
<F9>:
The marked section will be erased (filled with zeros).
<F10>:
This causes the editor to enter a mode in which text may be typed into the
column of the marker window titled "comments". These are simply reference
notes and have no effect on the operation of the editor. The comment entry
mode is exited by pressing the <Esc> key.
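The position bookkeeping described under <Insert> can be sketched as follows.
The structure and function names are hypothetical, and the sketch advances
one byte at a time purely for clarity; it is a model of the documented
behavior, not an extract from VDFE.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long size;  /* total bytes in the edit buffer */
    unsigned long pos;   /* current position               */
    unsigned long eof;   /* end-of-file position           */
} edit_buffer;

/* Account for one recorded byte written at the current position. */
void record_advance(edit_buffer *b)
{
    assert(b != NULL && b->size > 0 && b->pos < b->size);
    b->pos++;
    if (b->pos > b->eof)
        b->eof = b->pos;     /* end-of-file follows the current position */
    if (b->pos == b->size)
        b->pos = 0;          /* wrap around to offset zero */
}
```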
Graphical Print Files
---------------------
These files are prepared for output to an HP LaserJet Plus printer with the
minimum memory configuration (512k). To print one of the files, use
"COPY /B <filename> LPT1:" (or LPT2 if appropriate). The following lists the
contents of each file:
Filename    Density   Description
--------    -------   -----------
VMSCH.HPP   150 dpi   The schematic to the Digitizer.
VMPCB.125   300 dpi   A positive print of the "copper side" of
                      a single-sided circuit board implementing
                      the Digitizer, suitable for photo-reduction
                      to board manufacturing negatives. Scale is
                      1.250, producing the largest image that will
                      fit in the LaserJet 512k memory.
VMSLK.125   300 dpi   A positive print of a silkscreen component
                      placement guide for the component side of
                      the board. This may be either silkscreened
                      onto the board or simply printed out and
                      referred to while building the board. Scale
                      is 1.250.
VMDRL.125   300 dpi   A drilling guide for use in making numeric-
                      control tool tapes with a digitizing pad.
                      This print will not be of much use to those
                      who will be drilling the holes by hand.
                      Scale is 1.250.
VMPCB.100   300 dpi   A duplicate of VMPCB.125, but scaled 1:1 for
                      use with contact-print or direct transfer
                      methods of producing the negatives.
VMSLK.100   300 dpi   A duplicate of VMSLK.125, scaled 1:1.
VMDRL.100   300 dpi   A duplicate of VMDRL.125, scaled 1:1.
Due to the large size of the printed circuit board files, and the probability
that most users will not actually want to manufacture a board for this
device, these files are placed in a separate archive. Only the schematic,
VMSCH.HPP, is included in this archive.
All of these plots are available to registered users formatted for output on
a variety of other printers and pen plotters (photoplotters also). Contact
Farpoint Software at the address / CIS number shown in the Shareware Notice
section of this document.
Schematic Notes
---------------
The circuit is designed to operate from two 9-volt batteries connected to J1
and J2. The original circuit used a single-ended supply. This modification
requires fewer parts and produces the correct RS-232 voltages at the output.
Pad resistors have been added to the trimpot. This control in the original
version was somewhat difficult to adjust. The pad resistors decrease the
sensitivity of this control enough to allow a 1-turn potentiometer to be used,
thus reducing the length of the "hunt" for the proper position.
If your serial port uses a DB-9 connector, the cable from J4 is:
J4 pin 1 -------- DB-9 pin 5 (Ground)
J4 pin 2 -------- DB-9 pin 8 (CTS)
If your serial port uses a DB-25 connector, the cable from J4 is:
J4 pin 1 -------- DB-25 pin 7 (Ground)
J4 pin 2 -------- DB-25 pin 5 (CTS)
The circuit consists of two stages of voltage amplification with some
high-pass filtering built into the coupling capacitors, followed by a
differentiator. The output of the differentiator is fed to a voltage
comparator, thus producing an output which has approximately the following
relationship to the input from the microphone: if the derivative of the
speech waveform is positive, then the output is logic zero; if the derivative
of the speech waveform is negative, then the output is logic one. The
transition timing at the output is entirely analog in nature; there is no
synchronizing clock signal anywhere in the circuit.
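In software terms, the circuit behaves roughly like a one-bit encoder driven
by the sign of the slope. The sketch below applies that rule to sampled data.
The real circuit is continuous and unclocked, so the fixed sample spacing,
the function name, and the choice to hold the previous output when the slope
is exactly zero are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* For each sample after the first, write 0 if the waveform is rising and
   1 if it is falling. bits must have room for n - 1 entries. */
void slope_sign_encode(const double *x, size_t n, unsigned char *bits)
{
    size_t i;
    unsigned char last = 0;      /* assumed idle output level */

    assert(x != NULL && bits != NULL && n >= 2);
    for (i = 1; i < n; i++) {
        double d = x[i] - x[i - 1];
        if (d > 0.0)
            last = 0;            /* positive derivative -> logic zero */
        else if (d < 0.0)
            last = 1;            /* negative derivative -> logic one  */
        bits[i - 1] = last;      /* zero slope: hold previous output  */
    }
}
```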
If the output of this circuit is connected directly to a speaker, the
resulting sound will still be an understandable version of the input. Since
the output consists of nothing but a digital bit stream, the job of the
computer becomes that of simply recording and accurately reproducing this bit
stream.
The trimpot at the input of amplifier U3 is used to set the DC idle voltage
output from the differentiator to somewhere near the threshold of comparator
U4. There will be a considerable amount of noise at the output of U3,
originating at the microphone and within the input circuitry of U1, and
highly amplified by U1 and U2. The trimpot should be adjusted so that the
comparator threshold is just outside the normal excursion of the noise signal
("off to one side"), otherwise "silence" at the microphone will become, at
the speaker output from the computer, a loud hiss with a strong component at
half the sampling frequency.
I used LF356's for U1, U2, and U3, and an LM393 for U4. All amplifiers should
have power supply bypass capacitors (not shown). The microphone is a 600 ohm
dynamic type. The +-12 volt power supply should be quiet and well-regulated;
the one in the PC is too noisy unless you use heavy filtering. Power supply
bypassing consists of attaching capacitors in the 0.1 uF range (up to 1 uF is
ok) DIRECTLY across the power supply pins of each amplifier chip. Layout is
important here. The capacitors should use the shortest possible wire length
to the pins of the chips. There will be 8 caps required: one from +12 to
ground and one from -12 to ground for each chip. If you use dual or quad
amplifier chips instead of the LF356's, then of course only one set of caps
is required per actual chip. The purpose of the bypass caps is to provide a
highly localized low-impedance power source at each chip to prevent unwanted
positive feedback through the power leads (feedback between different chips).
Comments on the Digitization Technique
--------------------------------------
The speaker on the PC and its associated driver circuitry are quite simple and
crude, having been designed primarily for creating single square-wave tones
of various audio frequencies. This speaker is typically driven by a pair of
transistors used as a current amplifier, which is in turn driven directly by the
output of a TTL gate. This results in only two possibilities of voltage
across the voice coil: 0 volts and 5 volts. Any sound to be reproduced by
this system must be reduced to an approximation in the form of a stream of
constant-amplitude, variable-width rectangular pulses.
Examination of a speech waveform on an oscilloscope display quickly tells us
that it is not going to be possible to even remotely mimic this waveform
under the above restrictions. Much of the information contained in the
waveform is in the form of amplitude variations, and this is the one
attribute we cannot reproduce. It is initially tempting to try to use the
technique of the "class D" amplifier to create the waveform, using high-speed
pulse width modulation and depending on the mechanical characteristics of the
speaker and those of the human ear to provide the missing low-pass filtering.
Assuming the sampling rate to be 8 kHz (based on the Nyquist criterion) and,
to conserve memory, assuming the samples to contain only 4 bits of amplitude
information (16 levels), we can see that data accumulates at a rate of 4k
bytes per second, which is certainly acceptable. The problem comes when we
try to play back the sound. Pulses occur at intervals of 125 microseconds,
which doesn't seem too bad, but since each pulse can have 16 possible widths,
it is necessary to time the pulses with a resolution of well under 8
microseconds. This is only a couple of instruction times on a 4.77 MHz XT,
and even on a fast 80386 it doesn't give the CPU much time between bits to
shift bits, read and increment a pointer, check the pointer to see if it's
done yet, etc., not to mention the difficulty of servicing unrelated
interrupts.
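The figures in this paragraph follow from simple arithmetic, reproduced below
as a check. The function names are hypothetical.

```c
#include <assert.h>

/* Bytes of storage consumed per second of sound. */
double pwm_bytes_per_second(double sample_hz, unsigned bits_per_sample)
{
    assert(sample_hz > 0.0 && bits_per_sample > 0);
    return sample_hz * bits_per_sample / 8.0;
}

/* Timing step, in microseconds, needed to tell every pulse width apart. */
double pwm_step_us(double sample_hz, unsigned bits_per_sample)
{
    assert(sample_hz > 0.0 && bits_per_sample > 0 && bits_per_sample < 16);
    return 1.0e6 / sample_hz / (double)(1u << bits_per_sample);
}
```

For 8 kHz and 4 bits this gives 4000 bytes per second and a step of 7.8125
microseconds, which is about 37 clock cycles at 4.77 MHz.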
The search for simpler (but still usable) and less CPU-intensive methods of
reproducing speech leads to the question of what information in the waveform
we can discard without an unacceptable loss of intelligibility. My
experiments with running speech signals through a graphic equalizer revealed
that the lower-frequency components, those which are most visible to the eye
on the oscilloscope, are actually of minimal importance in understanding
speech. This is also demonstrated by the fact that a whisper is just as
understandable as normal speech, but does not make use of vibrating vocal
cords, which are the primary source of low-frequency components in the
voice.
The present digitizer circuit makes use of this observation by filtering out
most of the low-frequency components of the sound signal. Knowing that the
speaker cone cannot move instantaneously and serves as an approximation to
a mechanical integrator at high audio frequencies leads to the idea of
differentiating the input waveform. This accomplishes the following result:
the direction of movement of the speaker cone corresponds to the direction of
movement (derivative) of the waveform. Amplitude information is lost. As it
turns out, this is sufficiently understandable to be worth pursuing.
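The playback side can be modelled by treating the speaker as an integrator:
each bit nudges the cone one step in the direction the original waveform was
moving. This is a deliberately crude sketch. The fixed step size and the
function name are assumptions, and a real speaker is only an approximate
integrator, as noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Rebuild an approximate waveform from slope-sign bits: bit 0 means the
   waveform was rising, bit 1 means it was falling. y receives n values. */
void integrate_bits(const unsigned char *bits, size_t n, double step, double *y)
{
    size_t i;
    double level = 0.0;

    assert(bits != NULL && y != NULL && step > 0.0);
    for (i = 0; i < n; i++) {
        level += bits[i] ? -step : step;  /* direction only; amplitude is lost */
        y[i] = level;
    }
}
```

Every step has the same size no matter how steep the original waveform was,
which is the loss of amplitude information in concrete form.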